Introduction: The pathways deregulated during the malignant transformation of B cells have been identified for many of the common mature B-cell neoplasms, but a comprehensive description of the driver mutations that alter the function of proteins in these pathways is still lacking. Some variants are distinguishing features of a disease or are largely restricted to particular subtypes, whereas others are shared more widely across subtypes. Mutations that are prevalent in certain immune cancers can also occur at lower frequency in other subtypes, which can affect the relevance of targeted therapeutics. Because certain genomic regions are more susceptible to sequencing artifacts, and data pooled from multiple cohorts can be prone to batch effects, manual curation of hotspots is crucial to minimize noise and to increase statistical power for detecting lower-frequency hotspots. By integrating mutational variants from Burkitt lymphoma (BL) and diffuse large B-cell lymphoma (DLBCL), and by manually validating variants to reduce sequencing noise, it is possible to identify novel mutational hotspots within these two subtypes.

Methods: Mutational hotspot analyses were performed on a combined cohort of 2640 BL and DLBCL samples that had undergone either whole-genome sequencing (644 samples) or whole-exome sequencing (1996 samples). Samples were drawn from 28 in-house and external datasets. Simple somatic mutations (SSMs) were identified using four variant callers: Strelka2, MuTect2, SAGE, and LoFreq. Significantly mutated genes (SMGs) and mutational hotspots were identified from coding-region variants with HotMAPS, OncodriveCLUSTL, and OncodriveFML, using a consensus approach that required support from at least two of the three tools. Hotspots were curated by manually inspecting automatically generated IGV snapshots of each hotspot locus.
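The 2-of-3 consensus step can be illustrated with a minimal Python sketch. It assumes each tool's output has already been reduced to a mapping from gene name to q-value, applies the per-tool cutoffs reported in the Results (treating "at a q-value of X" as q ≤ X, which is an assumption), and counts one vote per tool. The function names and toy gene names are hypothetical and do not reflect the pipeline actually used in this study.

```python
from collections import Counter

# Per-tool q-value cutoffs taken from the Results section of this abstract.
Q_CUTOFFS = {"HotMAPS": 0.01, "OncodriveCLUSTL": 0.001, "OncodriveFML": 0.01}


def significant_genes(qvalues_by_tool, cutoffs=Q_CUTOFFS):
    """Return {tool: set of genes whose q-value is at or below that tool's cutoff}."""
    return {
        tool: {gene for gene, q in qvalues.items() if q <= cutoffs[tool]}
        for tool, qvalues in qvalues_by_tool.items()
    }


def consensus_smgs(qvalues_by_tool, min_votes=2):
    """Give each gene one vote per tool that calls it; keep genes with >= min_votes."""
    votes = Counter()
    for genes in significant_genes(qvalues_by_tool).values():
        votes.update(genes)
    return {gene: n for gene, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    # Toy q-values with hypothetical gene names; not data from this study.
    toy = {
        "HotMAPS": {"GENE_A": 1e-6, "GENE_B": 0.005, "GENE_C": 0.2},
        "OncodriveCLUSTL": {"GENE_A": 1e-5, "GENE_B": 0.01, "GENE_C": 0.0005},
        "OncodriveFML": {"GENE_A": 1e-4, "GENE_B": 0.002, "GENE_C": 0.5},
    }
    print(sorted(consensus_smgs(toy).items()))  # [('GENE_A', 3), ('GENE_B', 2)]
```

Raising `min_votes` to 3 yields the stricter all-three-tools set, the analogue of the 35-gene tier in the Results.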

Results: Owing to the size of the analysis cohort, a large number of SMGs and hotspots were identified. HotMAPS identified 180 genes at a q-value threshold of 0.01, OncodriveCLUSTL identified 219 genes at 0.001, and OncodriveFML identified 105 genes at 0.01. In total, 106 SMGs were identified by at least two tools, 35 of them by all three tools (Figure 1). Among these genes, only 10 have not yet been incorporated into the Cancer Gene Census, a figure that may serve as an estimate of the quality of the driver genes returned by these tools. Compared with our curated list of genes previously associated with BL or DLBCL, 39 of the 106 genes with at least two votes, but only 2 of the 35 genes with three votes, are novel. HotMAPS and OncodriveCLUSTL identified 207 and 254 hotspots, respectively; of these, 109 and 110 fell within genes called by two or more tools, and hotspots from both tools overlapped in 65 genes. Among coding SSMs in genes with at least two votes, 13% of hotspot mutations (HSMs) fell within hotspots called by both HotMAPS and OncodriveCLUSTL (Figure 2). Of the 207 HotMAPS hotspots, 177 (86%) were supported by variants from both genome and capture samples, and the remaining 30 (14%) were exome-specific. Of these 207 hotspots, 160 (77%) were identified in both matched- and unmatched-normal samples, whereas 47 (23%) were identified only in unmatched samples. Of the 254 OncodriveCLUSTL hotspots, 191 (75%) were supported by both genome and capture samples and 63 (25%) were exome-specific. Both matched and unmatched samples contributed to 151 (59%) of the OncodriveCLUSTL hotspots, whereas 103 (41%) were found only in unmatched samples.
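The hotspot stratification above can be sketched as follows, assuming each hotspot is represented by the sequencing strategy ("genome" or "capture") and normal type ("matched" or "unmatched") of its supporting samples. The record layout, labels, and function names are hypothetical; the `breakdown` helper simply checks that the reported counts reproduce the stated percentages.

```python
from collections import Counter


def seq_support(seq_types):
    """Label a hotspot by the sequencing strategies of the samples supporting it."""
    kinds = set(seq_types)
    if {"genome", "capture"} <= kinds:
        return "genome+capture"
    if kinds == {"capture"}:
        return "exome-specific"
    return "other"  # e.g. genome-only; no such category is reported in the abstract


def normal_support(normal_types):
    """Label a hotspot by the normal type (matched/unmatched) of its supporting samples."""
    kinds = set(normal_types)
    if {"matched", "unmatched"} <= kinds:
        return "matched+unmatched"
    if kinds == {"unmatched"}:
        return "unmatched-only"
    return "other"  # e.g. matched-only; no such category is reported in the abstract


def breakdown(counts):
    """Convert {category: count} into {category: (count, percent rounded to an integer)}."""
    total = sum(counts.values())
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy hotspots: one record per hotspot listing attributes of its supporting samples.
    toy_hotspots = [
        {"seq": ["genome", "capture", "capture"], "normal": ["matched", "unmatched"]},
        {"seq": ["capture", "capture"], "normal": ["unmatched", "unmatched"]},
    ]
    print(Counter(seq_support(h["seq"]) for h in toy_hotspots))
    print(Counter(normal_support(h["normal"]) for h in toy_hotspots))
    # The reported HotMAPS counts reproduce the stated percentages.
    print(breakdown({"genome+capture": 177, "exome-specific": 30}))
    # {'genome+capture': (177, 86), 'exome-specific': (30, 14)}
    print(breakdown({"matched+unmatched": 160, "unmatched-only": 47}))
    # {'matched+unmatched': (160, 77), 'unmatched-only': (47, 23)}
```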

Conclusions: All but two of the 35 genes identified by all three tools have previously been associated with DLBCL or BL as driver genes, which highlights the robustness of this suite of tools for identifying driver genes and mutational hotspots, including novel ones. Of the 106 genes identified by at least two tools, 39 have not yet been classified as driver genes and may be novel non-Hodgkin lymphoma (NHL) drivers. Manual inspection of the hotspots identified by HotMAPS and OncodriveCLUSTL is being used to expand curated blacklists and whitelists of mutational hotspots, minimizing the impact of low-quality variants and improving power to detect hotspots with lower mutation frequencies.
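The abstract does not specify how the curated lists are applied, so the following is only a minimal sketch under one plausible assumption: variants at blacklisted loci are removed before hotspot calling, and the remaining variants are flagged if they fall at a whitelisted locus. Loci are modeled as hypothetical `(chrom, pos)` tuples, and the `Variant` fields are illustrative rather than the study's schema.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Minimal SSM record; the fields are illustrative, not the study's schema."""
    sample: str
    chrom: str
    pos: int
    gene: str


def apply_curation(variants, blacklist, whitelist):
    """Drop variants at blacklisted loci and flag the rest that sit at whitelisted loci.

    blacklist and whitelist are sets of (chrom, pos) tuples produced by manual review.
    Returns a list of (variant, is_whitelisted) pairs in input order.
    """
    curated = []
    for v in variants:
        locus = (v.chrom, v.pos)
        if locus in blacklist:
            continue  # artifact-prone locus: excluded before hotspot calling
        curated.append((v, locus in whitelist))
    return curated


if __name__ == "__main__":
    # Toy variants at hypothetical loci; not data from this study.
    variants = [
        Variant("s1", "chr1", 100, "GENE_A"),
        Variant("s2", "chr1", 200, "GENE_B"),
        Variant("s3", "chr2", 50, "GENE_C"),
    ]
    for v, whitelisted in apply_curation(variants, {("chr1", 200)}, {("chr2", 50)}):
        print(v.sample, v.gene, "whitelisted" if whitelisted else "not whitelisted")
```

Position-level sets keep the example short; a real curation scheme might instead store intervals or allele-specific keys.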

Disclosures: Rushton: SAGA Diagnostics: Current Employment.
